Goto

Collaborating Authors

 network reconstruction


Uncertainty quantification and posterior sampling for network reconstruction

Peixoto, Tiago P.

arXiv.org Machine Learning

Network reconstruction is the task of inferring the unseen interactions between elements of a system, based only on their behavior or dynamics. This inverse problem is in general ill-posed, and admits many solutions for the same observation. Nevertheless, the vast majority of statistical methods proposed for this task -- formulated as the inference of a graphical generative model -- can only produce a ``point estimate,'' i.e. a single network considered the most likely. In general, this can give only a limited characterization of the reconstruction, since uncertainties and competing answers cannot be conveyed, even if their probabilities are comparable, while being structurally different. In this work we present an efficient MCMC algorithm for sampling from posterior distributions of reconstructed networks, which is able to reveal the full population of answers for a given reconstruction problem, weighted according to their plausibilities. Our algorithm is general, since it does not rely on specific properties of particular generative models, and is specially suited for the inference of large and sparse networks, since in this case an iteration can be performed in time $O(N\log^2 N)$ for a network of $N$ nodes, instead of $O(N^2)$, as would be the case for a more naive approach. We demonstrate the suitability of our method in providing uncertainties and consensus of solutions (which provably increases the reconstruction accuracy) in a variety of synthetic and empirical cases.


Network reconstruction via the minimum description length principle

Peixoto, Tiago P.

arXiv.org Machine Learning

A fundamental problem associated with the task of network reconstruction from dynamical or behavioral data consists in determining the most appropriate model complexity in a manner that prevents overfitting, and produces an inferred network with a statistically justifiable number of edges. The status quo in this context is based on $L_{1}$ regularization combined with cross-validation. However, besides its high computational cost, this commonplace approach unnecessarily ties the promotion of sparsity with weight "shrinkage". This combination forces a trade-off between the bias introduced by shrinkage and the network sparsity, which often results in substantial overfitting even after cross-validation. In this work, we propose an alternative nonparametric regularization scheme based on hierarchical Bayesian inference and weight quantization, which does not rely on weight shrinkage to promote sparsity. Our approach follows the minimum description length (MDL) principle, and uncovers the weight distribution that allows for the most compression of the data, thus avoiding overfitting without requiring cross-validation. The latter property renders our approach substantially faster to employ, as it requires a single fit to the complete data. As a result, we have a principled and efficient inference scheme that can be used with a large variety of generative models, without requiring the number of edges to be known in advance. We also demonstrate that our scheme yields systematically increased accuracy in the reconstruction of both artificial and empirical networks. We highlight the use of our method with the reconstruction of interaction networks between microbial communities from large-scale abundance samples involving in the order of $10^{4}$ to $10^{5}$ species, and demonstrate how the inferred model can be used to predict the outcome of interventions in the system.


Complex contagions can outperform simple contagions for network reconstruction with dense networks or saturated dynamics

Landry, Nicholas W., Thompson, William, Hébert-Dufresne, Laurent, Young, Jean-Gabriel

arXiv.org Machine Learning

Network scientists often use complex dynamic processes to describe network contagions, but tools for fitting contagion models typically assume simple dynamics. Here, we address this gap by developing a nonparametric method to reconstruct a network and dynamics from a series of node states, using a model that breaks the dichotomy between simple pairwise and complex neighborhood-based contagions. We then show that a network is more easily reconstructed when observed through the lens of complex contagions if it is dense or the dynamic saturates, and that simple contagions are better otherwise.


Scalable network reconstruction in subquadratic time

Peixoto, Tiago P.

arXiv.org Machine Learning

Network reconstruction consists in determining the unobserved pairwise couplings between $N$ nodes given only observational data on the resulting behavior that is conditioned on those couplings -- typically a time-series or independent samples from a graphical model. A major obstacle to the scalability of algorithms proposed for this problem is a seemingly unavoidable quadratic complexity of $O(N^2)$, corresponding to the requirement of each possible pairwise coupling being contemplated at least once, despite the fact that most networks of interest are sparse, with a number of non-zero couplings that is only $O(N)$. Here we present a general algorithm applicable to a broad range of reconstruction problems that achieves its result in subquadratic time, with a data-dependent complexity loosely upper bounded by $O(N^{3/2}\log N)$, but with a more typical log-linear complexity of $O(N\log^2N)$. Our algorithm relies on a stochastic second neighbor search that produces the best edge candidates with high probability, thus bypassing an exhaustive quadratic search. In practice, our algorithm achieves a performance that is many orders of magnitude faster than the quadratic baseline, allows for easy parallelization, and thus enables the reconstruction of networks with hundreds of thousands and even millions of nodes and edges.


A Systematic Evaluation of Node Embedding Robustness

Mara, Alexandru, Lijffijt, Jefrey, Günnemann, Stephan, De Bie, Tijl

arXiv.org Artificial Intelligence

Node embedding methods map network nodes to low dimensional vectors that can be subsequently used in a variety of downstream prediction tasks. The popularity of these methods has grown significantly in recent years, yet, their robustness to perturbations of the input data is still poorly understood. In this paper, we assess the empirical robustness of node embedding models to random and adversarial poisoning attacks. Our systematic evaluation covers representative embedding methods based on Skip-Gram, matrix factorization, and deep neural networks. We compare edge addition, deletion and rewiring attacks computed using network properties as well as node labels. We also investigate the performance of popular node classification attack baselines that assume full knowledge of the node labels. We report qualitative results via embedding visualization and quantitative results in terms of downstream node classification and network reconstruction performances. We find that node classification results are impacted more than network reconstruction ones, that degree-based and label-based attacks are on average the most damaging and that label heterophily can strongly influence attack performance.


A Bayesian Approach to Reconstructing Interdependent Infrastructure Networks from Cascading Failures

Wang, Yu, Yu, Jin-Zhu, Baroud, Hiba

arXiv.org Artificial Intelligence

Analyzing the behavior of complex interdependent networks requires complete information about the network topology and the interdependent links across networks. For many applications such as critical infrastructure systems, understanding network interdependencies is crucial to anticipate cascading failures and plan for disruptions. However, data on the topology of individual networks are often publicly unavailable due to privacy and security concerns. Additionally, interdependent links are often only revealed in the aftermath of a disruption as a result of cascading failures. We propose a scalable nonparametric Bayesian approach to reconstruct the topology of interdependent infrastructure networks from observations of cascading failures. Metropolis-Hastings algorithm coupled with the infrastructure-dependent proposal are employed to increase the efficiency of sampling possible graphs. Results of reconstructing a synthetic system of interdependent infrastructure networks demonstrate that the proposed approach outperforms existing methods in both accuracy and computational time. We further apply this approach to reconstruct the topology of one synthetic and two real-world systems of interdependent infrastructure networks, including gas-power-water networks in Shelby County, TN, USA, and an interdependent system of power-water networks in Italy, to demonstrate the general applicability of the approach.


Embedding Heterogeneous Networks into Hyperbolic Space Without Meta-path

Wang, Lili, Gao, Chongyang, Huang, Chenghan, Liu, Ruibo, Ma, Weicheng, Vosoughi, Soroush

arXiv.org Artificial Intelligence

Networks found in the real-world are numerous and varied. A common type of network is the heterogeneous network, where the nodes (and edges) can be of different types. Accordingly, there have been efforts at learning representations of these heterogeneous networks in low-dimensional space. However, most of the existing heterogeneous network embedding methods suffer from the following two drawbacks: (1) The target space is usually Euclidean. Conversely, many recent works have shown that complex networks may have hyperbolic latent anatomy, which is non-Euclidean. (2) These methods usually rely on meta-paths, which require domain-specific prior knowledge for meta-path selection. Additionally, different down-streaming tasks on the same network might require different meta-paths in order to generate task-specific embeddings. In this paper, we propose a novel self-guided random walk method that does not require meta-path for embedding heterogeneous networks into hyperbolic space. We conduct thorough experiments for the tasks of network reconstruction and link prediction on two public datasets, showing that our model outperforms a variety of well-known baselines across all tasks.


EPINE: Enhanced Proximity Information Network Embedding

Zhang, Luoyi, Xu, Ming

arXiv.org Machine Learning

Unsupervised homogeneous network embedding (NE) represents every vertex of networks into a low-dimensional vector and meanwhile preserves the network information. Adjacency matrices retain most of the network information, and directly charactrize the first-order proximity. In this work, we devote to mining valuable information in adjacency matrices at a deeper level. Under the same objective, many NE methods calculate high-order proximity by the powers of adjacency matrices, which is not accurate and well-designed enough. Instead, we propose to redefine high-order proximity in a more intuitive manner. Besides, we design a novel algorithm for calculation, which alleviates the scalability problem in the field of accurate calculation for high-order proximity. Comprehensive experiments on real-world network datasets demonstrate the effectiveness of our method in downstream machine learning tasks such as network reconstruction, link prediction and node classification.


Representation Learning for Heterogeneous Information Networks via Embedding Events

Fu, Guoji, Yuan, Bo, Duan, Qiqi, Yao, Xin

arXiv.org Machine Learning

Network representation learning (NRL) has been widely used to help analyze large-scale networks through mapping original networks into a low-dimensional vector space. However, existing NRL methods ignore the impact of properties of relations on the object relevance in heterogeneous information networks (HINs). To tackle this issue, this paper proposes a new NRL framework, called Event2vec, for HINs to consider both quantities and properties of relations during the representation learning process. Specifically, an event (i.e., a complete semantic unit) is used to represent the relation among multiple objects, and both event-driven first-order and second-order proximities are defined to measure the object relevance according to the quantities and properties of relations. We theoretically prove how event-driven proximities can be preserved in the embedding space by Event2vec, which utilizes event embeddings to facilitate learning the object embeddings. Experimental studies demonstrate the advantages of Event2vec over state-of-the-art algorithms on four real-world datasets and three network analysis tasks (including network reconstruction, link prediction, and node classification).


Variational Bayesian Complex Network Reconstruction

Xu, Shuang, Zhang, Chun-Xia, Wang, Pei, Zhang, Jiangshe

arXiv.org Machine Learning

The networked systems are ubiquitous in many fields, including social-tech science [1, 2], bioinformatics [3-6], epidemic dynamics [7-9] and power grid [10, 11]. However, as is often the case, it is not able to observe the topology of a network, while data generated by this network are available. Therefore, in interdisciplinary science, one of the most important but challenging problems is to reconstruct the complex network from the observed data or time series [12]. This problem has been widely investigated in the past three decades, where the classical method is the delay-coordinate embedding method proposed by Takens [13], which, nevertheless, is only suitable for small-scale networks. Nowadays, with the advent of big data era [14], it is of great urgency solve this issue for large-scale complex networks. Suppose that a complex network consists of N nodes, in practice we are often given the time series of the states for the N nodes. Generally speaking, the core idea of many data-driven network reconstruction investigations is to first calculate the correlation between two nodes. Then, a threshold can be set mutually or automatically to make the network binary.